Part-of-Speech Tagging Using the Brill Method
Abstract
Part-of-speech tagging is the process of associating each word in a text with its part-of-speech category and possibly a set of morphosyntactic features. This information is represented by part-of-speech tags. This paper describes an implementation of a part-of-speech tagger for Swedish based on the Brill method. The basic idea is to apply a set of rules to an initial annotation produced by a simple algorithm. The rules are found using transformation-based learning applied to a manually tagged training corpus. The paper also addresses the problem of tagging unknown words, i.e. words that do not appear in the training corpus.
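The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a most-frequent-tag lexicon as the "simple algorithm" for the initial annotation, a single rule template ("change tag A to B when the previous tag is C"), a greedy learner that repeatedly keeps the rule with the largest net error reduction, and a fixed default tag `NN` as a placeholder for unknown words. The toy English corpus and all function names are hypothetical; the paper itself targets Swedish.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sents):
    """Map each word to its most frequent tag in the training corpus."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def initial_tags(words, lexicon, default="NN"):
    """Initial annotation; unknown words get a placeholder default tag."""
    return [lexicon.get(w, default) for w in words]

def apply_rule(tags, rule):
    """Apply (from_tag, to_tag, prev_tag) left to right on a copy of tags."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(out)):
        if out[i] == from_tag and out[i - 1] == prev_tag:
            out[i] = to_tag
    return out

def learn_rules(tagged_sents, lexicon, max_rules=10):
    """Greedy transformation-based learning over one rule template."""
    gold = [[t for _, t in s] for s in tagged_sents]
    current = [initial_tags([w for w, _ in s], lexicon) for s in tagged_sents]
    rules = []
    for _ in range(max_rules):
        # Propose one candidate rule per remaining tagging error.
        candidates = set()
        for cur, g in zip(current, gold):
            for i in range(1, len(cur)):
                if cur[i] != g[i]:
                    candidates.add((cur[i], g[i], cur[i - 1]))
        # Keep the candidate with the largest net gain in correct tags.
        best, best_gain = None, 0
        for rule in sorted(candidates):
            gain = 0
            for cur, g in zip(current, gold):
                new = apply_rule(cur, rule)
                gain += sum(n == x for n, x in zip(new, g))
                gain -= sum(c == x for c, x in zip(cur, g))
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no candidate improves the corpus
            break
        rules.append(best)
        current = [apply_rule(c, best) for c in current]
    return rules

def tag(words, lexicon, rules):
    """Initial annotation followed by the learned rules, in order."""
    tags = initial_tags(words, lexicon)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags

# Toy manually tagged corpus (hypothetical data, for illustration only).
corpus = [
    [("the", "DT"), ("can", "NN"), ("rusted", "VBD")],
    [("we", "PRP"), ("can", "MD"), ("run", "VB")],
    [("they", "PRP"), ("can", "MD"), ("swim", "VB")],
]
lexicon = train_baseline(corpus)
rules = learn_rules(corpus, lexicon)
print(rules)                                      # [('MD', 'NN', 'DT')]
print(tag(["the", "can", "rusted"], lexicon, rules))  # ['DT', 'NN', 'VBD']
```

In this toy run the baseline tags "can" as `MD` everywhere, and the single learned rule corrects it to `NN` after a determiner. A real tagger would use a richer set of rule templates and a dedicated strategy for unknown words rather than a fixed default tag.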
Similar resources
Unsupervised Learning of Disambiguation Rules for Part of Speech Tagging
In this paper we describe an unsupervised learning algorithm for automatically training a rule-based part of speech tagger without using a manually tagged corpus. We compare this algorithm to the Baum-Welch algorithm, used for unsupervised training of stochastic taggers. Next, we show a method for combining unsupervised and supervised rule-based training algorithms to create a highly accurate t...
Unsupervised Part-of-speech Tagging
Different approaches have been taken in order to solve the part-of-speech tagging problem. Several methods for unsupervised tagging have obtained good accuracies in practice. The approach taken by Brill [Bri95] obtains results comparable to the best existing taggers. In this paper we explore the details of this unsupervised part-of-speech tagger and we present a comparison to the Xerox tagger, wh...
A Part-of-Speech Tagging System for the Persian Language
Abstract: Part-of-speech (POS) tagging is an essential step for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, and automatic speech recognition. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...
Brill Tagging using the Micron Automata Processor
Brill tagging is a classic rule-based algorithm for part-of-speech tagging within Natural Language Processing. However, implementation of the tagger is inherently slow on conventional Von Neumann architectures. In this paper, we accelerate the second stage of Brill tagging on the Micron Automata Processor, a new computing architecture that can perform massive pattern matching in parallel. The d...
Exploring the Statistical Derivation of Transformational Rule Sequences for Part-of-Speech Tagging
Eric Brill in his recent thesis (1993b) proposed an approach called "transformation-based error-driven learning" that can statistically derive linguistic models from corpora, and he has applied the approach in various domains including part-of-speech tagging (Brill, 1992; Brill, 1994) and building phrase structure trees (Brill, 1993a). The method learns a sequence of symbolic rules that charact...